Week 12 Monday¶
Monday, Nov 11, 2024
Announcements
- A08 (project proposal) due Wednesday, 11/13
- A09 Assigned: Due Monday, Dec 2
Goals
- Introduction to Image Processing
- In-Class exercise work
Introduction to Image Processing¶
Cameras are a huge part of modern robotics. We will not cover anywhere near "all" of the topic of image processing. What you will see in ES302 is just a "sampler" of some of the basic "pieces" that make up image processing "pipelines."
What is an image?¶
A greyscale image is nothing more than a matrix of values corresponding to the "brightness" measured by a particular camera pixel. We infer information from images all the time, but that's all they "really" are.
from numpy import *
from matplotlib.pyplot import *
a = zeros((10,10))
a[2,2]=255
a[2,8] = 255
a[6,2]=a[7,3]=a[8,4]=a[8,5]=a[8,6]=a[7,7]=a[6,8]=125
imshow(a,cmap='gray', vmin=0, vmax=255)
What is a color image?¶
A color image is the same, but normally its dimensions are n rows x m columns x 3 "channels." When an image is loaded with matplotlib's imread (as below), the three channels are red, green, and blue; OpenCV's own imread returns them in the reverse (BGR) order.
from numpy import *
from matplotlib.pyplot import *
im = imread('figures/RomiFrame.jpg')
print("image size: ")
print(im.shape)
imshow(im)
image size: (480, 640, 3)
What information can you extract from this image?¶
Techniques for image processing help a robot extract information¶
There are lots of great techniques in "traditional" image processing, and modern (deep learning, CNN) based image processing techniques. We will focus on three aspects of "traditional" image processing, because these are often implicitly used by "modern" machine learning and AI techniques.
- Brightness
- thresholding an image to show only "bright" or only "dark" objects
- Edge information
- Directions of edges
- "corners"
- Color information
- "blobs" of certain colors
- "masking" or thresholding to isolate objects of a certain color
Converting color images to greyscale¶
- turn color image into a greyscale image for processing (edges, brightness, etc)
Manually, with numpy, by taking the 2-norm across the three color channels:
#get image size
rows,cols,depth = im.shape
#create a grey image as a placeholder
imgrey = zeros((rows,cols))
#loop through and compute greyscale brightness
for i in range(0,rows):
    for j in range(0,cols):
        bright = 0
        for k in range(0,3):
            #cast to float before squaring: squaring raw uint8 values overflows
            bright += float(im[i,j,k])**2
        bright = bright**.5
        imgrey[i,j] = bright
imshow(imgrey,cmap='gray')
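The double loop above is slow in pure Python; numpy can do the same 2-norm in one vectorized expression. A minimal sketch on a tiny made-up image (the `im` array here is illustrative, not the lecture's Romi frame):

```python
import numpy as np

# toy 2x2 color image, uint8 like a decoded JPEG
im = np.array([[[255, 0, 0], [0, 255, 0]],
               [[0, 0, 255], [10, 20, 30]]], dtype=np.uint8)

# cast to float before squaring (uint8 would overflow), sum over the
# channel axis, then take the square root: the 2-norm per pixel
imgrey = np.sqrt((im.astype(float) ** 2).sum(axis=2))

print(imgrey.shape)  # (2, 2)
print(imgrey[0, 0])  # 255.0 for a pure-red pixel
```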
Converting color images to greyscale (using opencv)¶
The OpenCV library has python bindings that can be installed on the command line using pip:
$ pip install opencv-python
Then you can use opencv's (much faster) built-in libraries for many of these tasks:
import cv2
#im was loaded with matplotlib's imread, so it is in RGB channel order
imgrey = cv2.cvtColor(im, cv2.COLOR_RGB2GRAY)
imshow(imgrey,cmap='gray')
Brightness Thresholding¶
- turn greyscale image into a binary image with either full brightness (255) or full darkness (0).
#anything brighter than thresh will be full brightness
thresh = 200
imbool = zeros(shape(imgrey))
for i in range(0,rows):
    for j in range(0,cols):
        imbool[i,j] = 255*(imgrey[i,j]>thresh)
subplot(1,3,1)
imshow(im)
subplot(1,3,2)
imshow(imgrey,cmap='gray')
subplot(1,3,3)
imshow(imbool,cmap='gray')
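The thresholding loop can also be written as a single vectorized numpy comparison; a small sketch on a made-up 2x2 greyscale image:

```python
import numpy as np

imgrey = np.array([[10, 220],
                   [199, 201]], dtype=np.uint8)
thresh = 200

# the comparison gives a boolean array; multiplying by 255 turns
# True/False into 255/0, the binary image
imbool = 255 * (imgrey > thresh).astype(np.uint8)
print(imbool)  # [[  0 255]
               #  [  0 255]]
```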
Edge Detection¶
"edges" are really just places where the absolute value of the derivative of brightness (or color) with respect to pixel location is high... in other words, there is a change in intensity with respect to space. For example, we can find edges in the x and y directions as follows:
edgeX = zeros(shape(imgrey))
edgeY = zeros(shape(imgrey))
for i in range(1,rows):
    for j in range(1,cols):
        #compute edges by approximating the derivative as a difference;
        #cast to float first: subtracting raw uint8 values wraps around at zero
        edgeX[i,j] = abs(float(imgrey[i,j])-float(imgrey[i,j-1]))
        edgeY[i,j] = abs(float(imgrey[i,j])-float(imgrey[i-1,j]))
This method isn't great, and can be improved upon! For one thing, it approximates the derivative using only the "direct neighbors" of a particular pixel, which magnifies noise. (Note also that the subtraction must be done in a signed or floating-point type; subtracting unsigned 8-bit pixel values directly overflows.)
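The same neighbor differences can be computed without explicit loops using numpy's diff; a sketch on a tiny synthetic image (casting to float avoids the uint8 wrap-around):

```python
import numpy as np

imgrey = np.array([[0,   0,   255],
                   [0,   0,   255],
                   [255, 255, 255]], dtype=np.uint8)

g = imgrey.astype(float)  # signed math: differences can go negative
edgeX = np.zeros_like(g)
edgeY = np.zeros_like(g)
edgeX[:, 1:] = np.abs(np.diff(g, axis=1))  # differences between horizontal neighbors
edgeY[1:, :] = np.abs(np.diff(g, axis=0))  # differences between vertical neighbors

print(edgeX[0, 2], edgeY[2, 0])  # 255.0 255.0 at the brightness jumps
```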
Kernel-based operations¶
Operations like derivatives can also be conceptualized as a "convolution" of a "kernel" across an image. A "kernel" is a small (usually 3x3) matrix of "weights."
For convolution, the value of each output pixel is computed by summing the products of the kernel weights and the image pixels that the kernel overlaps.
An approximate x-derivative operator called the "X Sobel kernel" could be written as a 3x3 kernel: $$ K = \begin{bmatrix}-1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix}$$
We slide this kernel across and down the image, computing the value of Dx[i,j] for every row i and column j: $$ Dx[i,j] = \left| \frac{\sum_{ki=0}^{2}\sum_{kj=0}^2 K[ki,kj]\cdot I[i+ki-1,j+kj-1]}{F} \right| $$
- Dx represents the "output image" (an x-derivative in this case)
- I represents the original image
- ki and kj are row and column counters for the kernel itself
- F is either the sum of the kernel values if that sum is nonzero, or 1 otherwise
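To make the formula concrete, here is the computation at a single pixel of a tiny made-up image (values chosen for illustration). The X Sobel kernel sums to zero, so F = 1:

```python
import numpy as np

K = np.array([[-1, 0, 1],
              [-2, 0, 2],
              [-1, 0, 1]])

# a tiny image with a vertical brightness step from 10 to 50
I = np.array([[10, 10, 50],
              [10, 10, 50],
              [10, 10, 50]], dtype=float)

# evaluate Dx at the center pixel (i=1, j=1): the kernel overlaps the
# whole 3x3 image, so this is just an elementwise product and a sum
val = abs((K * I).sum())  # F = 1 because K sums to zero
print(val)  # 160.0: left column contributes -40, right column +200
```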
Generalized Convolution Function in Python¶
def myconv(I,K):
    """convolves for 3x3 kernels only"""
    #compute F, the normalizer (1 if the kernel sums to zero)
    F = sum(sum(K))
    if F == 0:
        F = 1
    #compute image size
    In,Im = I.shape
    #initialize output image
    Imout = zeros(shape(I))
    for i in range(1,In-1): #kernel pad
        for j in range(1,Im-1): #kernel pad
            Ksum = 0 #kernel sum (will be placed in output)
            for ki in range(0,3):
                for kj in range(0,3):
                    #cast to float to avoid uint8 overflow
                    Ksum += float(I[i-1+ki,j-1+kj])*K[ki,kj]
            #absolute value of the normalized sum, per the formula above
            Imout[i,j] = abs(Ksum/F)
    return Imout
Test Convolution Function with "Sobel" derivative operators¶
Kx = array([[-1,0,1],[-2,0,2],[-1,0,1]])
Ky = array([[-1,-2,-1],[0,0,0],[1,2,1]])
Dx = myconv(imgrey,Kx)
Dx = Dx*255/Dx.max() #normalize to get max of 255
Dy = myconv(imgrey,Ky)
Dy = Dy*255/Dy.max()
Other common kernels:¶
- Gaussian Blur: $$K = \frac{1}{16}\begin{bmatrix}1&2&1\\2&4&2\\1&2&1\end{bmatrix}$$
- Sharpen: $$K = \begin{bmatrix}0&-1&0\\-1&5&-1\\0&-1&0\end{bmatrix}$$
Kernels can be applied repeatedly to enhance results. Convolutions are a huge part of modern Convolutional Neural Networks too!
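Repeated application can be sketched with a minimal pure-numpy convolution (not the opencv implementation used later): blurring a single bright pixel twice spreads its energy further and lowers the peak each time.

```python
import numpy as np

# the 3x3 Gaussian blur kernel from above; it sums to 1
K = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]], dtype=float) / 16.0

def conv3x3(I, K):
    """Naive 3x3 convolution, leaving a one-pixel zero border."""
    out = np.zeros_like(I)
    for i in range(1, I.shape[0] - 1):
        for j in range(1, I.shape[1] - 1):
            out[i, j] = (K * I[i-1:i+2, j-1:j+2]).sum()
    return out

img = np.zeros((7, 7))
img[3, 3] = 16.0          # one bright pixel

once = conv3x3(img, K)    # blur once
twice = conv3x3(once, K)  # blur again: peak keeps shrinking as energy spreads
print(once[3, 3], twice[3, 3])  # 4.0 2.25
```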
In practice, opencv's convolution functions are much faster than pure python¶
import cv2
Dx_cv = cv2.filter2D(src=imgrey, ddepth=-1, kernel=Kx)
subplot(1,2,1)
imshow(Dx_cv,cmap='gray')
title('opencv implementation')
subplot(1,2,2)
imshow(Dx,cmap='gray')
title('python (our) implementation')
OpenCV includes many "combination" algorithms that use convolution¶
- One important example is the Canny edge filter, which finds edges (in all directions) and post-processes them (ultimately with a brightness threshold) to produce a binary "line image."
import cv2
can = cv2.Canny(imgrey,100,200)
subplot(1,2,1)
imshow(imgrey,cmap='gray')
title('greyscale')
subplot(1,2,2)
imshow(can,cmap='gray')
title('canny')
Color Thresholding: RGB space¶
Often, a robot in a structured environment might "know" that an object of interest is of a particular color. The RGB "color space" that's standard for images isn't the best for isolating particular colors, but it's possible. A different approach to RGB thresholding is given here
import cv2
#simplest: find the "greenest" objects using a threshold on the green channel,
#then ANDing this with inverse thresholds on the other two channels
T,threshgreen = cv2.threshold(im[:,:,1],150,255,cv2.THRESH_BINARY)#greater than threshold are white
T,threshblue = cv2.threshold(im[:,:,0],100,255,cv2.THRESH_BINARY_INV)#less than threshold are white
T,threshred = cv2.threshold(im[:,:,2],100,255,cv2.THRESH_BINARY_INV)#less than threshold are white
#now use bitwise AND to combine the white regions: very green, and not much of either other channel
thresh_greenonly = cv2.bitwise_and(cv2.bitwise_and(threshred,threshblue),threshgreen)
imshow(thresh_greenonly,cmap='gray')
Color Thresholding: HSV space¶
The "HSV" color space looks at color differently than RGB (BGR). The HSV space classifies a pixel into a "hue" (around a color wheel), a "saturation," and a "value" (intensity). This makes it easier to isolate colors in ways that are less lighting-dependent.
For "green" thresholding, we might choose upper and lower bounds in the HSV space as:
lower_color_bounds = np.array([90*255/360, 30, 30],dtype=np.ubyte)
upper_color_bounds = np.array([150*255/360,255,255],dtype=np.ubyte)
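You can sanity-check hue values with Python's standard colorsys module, which returns hue as a fraction of a full turn; pure green sits at 120 degrees. (Note, as an aside, that OpenCV stores hue as degrees/2, i.e. in [0, 180), for 8-bit images, so bounds must be scaled to whatever convention your conversion produces.)

```python
import colorsys

# colorsys works on floats in [0, 1] and returns (h, s, v), h in [0, 1)
h, s, v = colorsys.rgb_to_hsv(0.0, 1.0, 0.0)
print(h * 360)  # 120.0 degrees: pure green, fully saturated
```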
Thresholding with the HSV color space¶
lower_color_bounds = np.array([80*255/360, 0, 0],dtype=np.ubyte)
upper_color_bounds = np.array([110*255/360,255,255],dtype=np.ubyte)
#convert original image to HSV color space for easy thresholding
imhsv = cv2.cvtColor(im,cv2.COLOR_BGR2HSV)
#using inRange, we can threshold the image so we only see what's in our color boundary
hsv_bool = cv2.inRange(imhsv,lower_color_bounds,upper_color_bounds)
#"mask" original image so we can see the colors we included
thresh = cv2.bitwise_and(im, im, mask=hsv_bool)
What can we do with a thresholded (binary) image?¶
- If we are able to create a binary image with "only" our object of interest, we can calculate the centroid of the nonzero pixels, and assume that this is our object's centroid.
- This is a simplified version of a "blob detection" algorithm. You can learn about blob detection in more detail here
- if there are $N$ white pixels in an image, the image centroid's x-location can be computed as $$ c_x = \frac{1}{N} \sum_{j=1}^m \sum_{i=1}^n j\cdot I[i,j]/255$$
- if there are $N$ white pixels in an image, the image centroid's y-location can be computed as $$ c_y = \frac{1}{N} \sum_{j=1}^m \sum_{i=1}^n i\cdot I[i,j]/255$$
def findCxCy(I):
    rows,cols = I.shape #use the image's own size rather than globals
    cxsum = 0
    cysum = 0
    nWhite = sum(I)/255
    for i in range(0,rows):
        for j in range(0,cols):
            cxsum += j*(I[i,j]/255)
            cysum += i*(I[i,j]/255)
    cx = cxsum/nWhite
    cy = cysum/nWhite
    return int(cx),int(cy)
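The same centroid can be computed without loops: numpy's argwhere lists the (row, col) index of every nonzero pixel, and the mean of those indices is the centroid. A sketch on a small synthetic binary image:

```python
import numpy as np

I = np.zeros((5, 5))
I[1:3, 2:4] = 255   # a small white "blob": rows 1-2, cols 2-3

# argwhere returns one (row, col) pair per white pixel;
# averaging over pixels gives (cy, cx) in one step
cy, cx = np.argwhere(I > 0).mean(axis=0)
print(cx, cy)  # 2.5 1.5
```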
Computing the Centroid of a Binarized Image¶
#use the function we just wrote:
cx,cy = findCxCy(hsv_bool)
print("cx,cy: "+str(cx)+","+str(cy))
#draw a red circle at the computed centroid to validate
cv2.circle(im, (cx, cy), 10, (255, 0, 0), -1)
cv2.putText(im, "Ball Centroid", (cx - 25, cy - 25),cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 255, 255), 2)
imshow(im)
cx,cy: 166,372
Finding Centroids using opencv's "image moments"¶
OpenCV calculates the "moments of area" of an image using the cv2.moments function. This is a faster way to do the same thing!
# calculate moments of binary image
M = cv2.moments(hsv_bool)
# calculate x,y coordinate of center
cX = int(M["m10"] / M["m00"])
cY = int(M["m01"] / M["m00"])
print(cX,cY)
166 372
Class Exercise¶
In the Week12_InClass world, the Romi is equipped with a camera. Inside the controller file, you are already presented with the methodology to read camera frames in Webots.
- Allow the Romi to explore using an explore FSM until it sees a ball.
- Once a ball is in frame, use a feedback controller to "aim" (angle in yaw) the Romi so that the center of the ball is in the center of the image.
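One way to sketch the "aim" step is a proportional controller on the horizontal pixel error between the ball centroid and the image center. Everything here (function name, gain, the wheel-speed interface) is an illustrative assumption, not the Webots controller API from the in-class world:

```python
def aim_speeds(cx, image_width=640, kp=0.005, base_speed=0.0):
    """Proportional yaw controller (illustrative sketch).

    cx: x-coordinate of the ball centroid in pixels.
    Returns (left, right) wheel speed commands.
    """
    error = cx - image_width / 2.0  # positive if the ball is right of center
    turn = kp * error               # proportional correction
    # speeding up the left wheel and slowing the right turns the robot right
    return base_speed + turn, base_speed - turn

left, right = aim_speeds(480)  # ball at x=480, right of a 640-wide frame's center
print(left, right)
```

Tuning kp trades responsiveness against oscillation; in the exercise you would feed cx from the centroid computation above each camera frame.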